1.
2022 IEEE International Conference on Big Data, Big Data 2022 ; : 101-106, 2022.
Article in English | Scopus | ID: covidwho-2255051

ABSTRACT

The t-distributed stochastic neighbor embedding (t-SNE) is a method for interpreting high dimensional (HD) data by mapping each point to a low dimensional (LD) space (usually two-dimensional). It seeks to retain the structure of the data. An important component of the t-SNE algorithm is the initialization procedure, which begins with the random initialization of an LD vector. Points in this initial vector are then updated iteratively using gradient descent to minimize the loss function (the KL divergence). This causes similar points to attract one another while pushing dissimilar points apart. We believe that, by default, these algorithms should employ some form of informative initialization. Another essential component of t-SNE is the kernel matrix, a similarity matrix comprising the pairwise distances among the sequences. For t-SNE-based visualization, the Gaussian kernel is employed by default in the literature. However, we show that kernel selection can also play a crucial role in the performance of t-SNE. In this work, we assess the performance of t-SNE with various alternative initialization methods and kernels, using four different datasets, three of which are biological sequence (nucleotide, protein, etc.) datasets obtained from various sources, such as the well-known GISAID database for sequences of the SARS-CoV-2 virus. We perform subjective and objective assessments of these alternatives. We use the resulting t-SNE plots and k-ary neighborhood agreement (k-ANA) to evaluate and compare the proposed methods with the baselines. We show that techniques such as informed initialization and kernel matrix selection make t-SNE perform significantly better. Moreover, we show that t-SNE converges in fewer iterations with more intelligent initialization. © 2022 IEEE.
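
The abstract evaluates embeddings with k-ary neighborhood agreement (k-ANA) but does not give its formula. The sketch below assumes one common reading: for each point, take its k nearest neighbors in the HD space and in the LD space, count the overlap, and average over all points. The function names and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math

def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to point i.
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors

def neighborhood_agreement(hd_points, ld_points, k):
    """Average fraction of k nearest neighbors shared between HD and LD (assumed k-ANA form)."""
    hd = knn_sets(hd_points, k)
    ld = knn_sets(ld_points, k)
    n = len(hd_points)
    return sum(len(a & b) for a, b in zip(hd, ld)) / (n * k)

# Toy data (hypothetical): two well-separated pairs in 3-D.
hd = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (6, 5, 5)]
ld_good = [(0, 0), (1, 0), (5, 5), (6, 5)]  # keeps each pair together
ld_bad = [(0, 0), (5, 5), (1, 0), (6, 5)]   # splits each pair apart

print(neighborhood_agreement(hd, ld_good, 1))
print(neighborhood_agreement(hd, ld_bad, 1))
```

A score near 1 means the embedding preserves local neighborhoods, and a score near 0 means it scrambles them. This brute-force version costs O(n²) distance evaluations per space, so it is only suitable for small examples.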

2.
Comput Struct Biotechnol J ; 20: 5564-5573, 2022.
Article in English | MEDLINE | ID: covidwho-2061048

ABSTRACT

Viral infections represent a major health concern worldwide. The alarming rate at which SARS-CoV-2 spreads, for example, led to a worldwide pandemic. Viruses incorporate genetic material into the host genome to hijack host cell functions such as the cell cycle and apoptosis. In these viral processes, protein-protein interactions (PPIs) play critical roles. Therefore, the identification of PPIs between humans and viruses is crucial for understanding the infection mechanism and host immune responses to viral infections and for discovering effective drugs. Experimental methods including mass spectrometry-based proteomics and yeast two-hybrid assays are widely used to identify human-virus PPIs, but these experimental methods are time-consuming, expensive, and laborious. To overcome this problem, we developed a novel computational predictor, named cross-attention PHV, by implementing two key technologies of the cross-attention mechanism and a one-dimensional convolutional neural network (1D-CNN). The cross-attention mechanisms were very effective in enhancing prediction and generalization abilities. Application of 1D-CNN to the word2vec-generated feature matrices reduced computational costs, thus extending the allowable length of protein sequences to 9000 amino acid residues. Cross-attention PHV outperformed existing state-of-the-art models using a benchmark dataset and accurately predicted PPIs for unknown viruses. Cross-attention PHV also predicted human-SARS-CoV-2 PPIs with area under the curve values >0.95. The Cross-attention PHV web server and source codes are freely available at https://kurata35.bio.kyutech.ac.jp/Cross-attention_PHV/ and https://github.com/kuratahiroyuki/Cross-Attention_PHV, respectively.
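
The abstract names a cross-attention mechanism without describing its layers. As a rough illustration only, the sketch below shows plain scaled dot-product cross-attention, in which queries come from one sequence (for example, human protein features) and keys and values come from the other (for example, viral protein features). It omits learned projections, multiple heads, and the 1D-CNN, and every name and number in it is a hypothetical placeholder rather than the authors' implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over the other sequence."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output column at a time.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

# Hypothetical toy features: 3 "human" positions attend over 2 "virus" positions.
human = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
virus_keys = [[1.0, 1.0], [1.0, 1.0]]
virus_values = [[0.0, 2.0], [4.0, 6.0]]

attended = cross_attention(human, virus_keys, virus_values)
print(attended)
```

Because the two keys in this toy example are identical, every query weights them equally and each output row is the mean of the value vectors.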

3.
Methods Mol Biol ; 2414: 433-447, 2022.
Article in English | MEDLINE | ID: covidwho-1588848

ABSTRACT

Vaccines induce a highly complex immune reaction in secondary lymphoid organs to generate immunological memory against an antigen or antigens of interest. Measurement of post immunization immune responses generated by specialized lymphocyte subsets requires time-dependent sampling, usually of the blood. Several T and B cell subsets are involved in the reaction, including CD4 and CD8 T cells, T follicular helper cells (Tfh), and germinal center B cells alongside their circulating (c) counterparts; cTfh and antibody secreting cells. Multicolor flow cytometry of peripheral blood mononuclear cells (PBMC) coupled with high-dimensional analysis offers an opportunity to study these cells in detail. Here we demonstrate a method by which such data can be generated and analysed using software that renders multidimensional data on a two dimensional map to identify rare vaccine-induced T and B cell subsets.


Subject(s)
Flow Cytometry , Leukocytes, Mononuclear , Data Analysis , T-Lymphocytes, Helper-Inducer , Vaccinology
4.
J Hematol Oncol ; 14(1): 174, 2021 10 24.
Article in English | MEDLINE | ID: covidwho-1473657

ABSTRACT

BACKGROUND: Factors affecting response to SARS-CoV-2 mRNA vaccine in allogeneic hematopoietic stem cell transplantation (allo-HCT) recipients remain to be elucidated. METHODS: Forty allo-HCT recipients were included in a study of immunization with BNT162b2 mRNA vaccine at days 0 and 21. Binding antibodies (Ab) to SARS-CoV-2 receptor binding domain (RBD) were assessed at days 0, 21, 28, and 49 while neutralizing Ab against SARS-CoV-2 wild type (NT50) were assessed at days 0 and 49. Results observed in allo-HCT patients were compared to those obtained in 40 healthy adults naive of SARS-CoV-2 infection. Flow cytometry analysis of peripheral blood cells was performed before vaccination to identify potential predictors of Ab responses. RESULTS: Three patients had detectable anti-RBD Ab before vaccination. Among the 37 SARS-CoV-2 naive patients, 20 (54%) and 32 (86%) patients had detectable anti-RBD Ab 21 days and 49 days postvaccination. Comparing anti-RBD Ab levels in allo-HCT recipients and healthy adults, we observed significantly lower anti-RBD Ab levels in allo-HCT recipients at days 21, 28 and 49. Further, 49% of allo-HCT patients versus 88% of healthy adults had detectable NT50 Ab at day 49 while allo-HCT recipients had significantly lower NT50 Ab titers than healthy adults (P = 0.0004). Ongoing moderate/severe chronic GVHD (P < 0.01) as well as rituximab administration in the year prior to vaccination (P < 0.05) correlated with low anti-RBD and NT50 Ab titers at 49 days after the first vaccination in multivariate analyses. Compared to healthy adults, allo-HCT patients without chronic GVHD or rituximab therapy had comparable anti-RBD Ab levels and NT50 Ab titers at day 49. Flow cytometry analyses before vaccination indicated that Ab responses in allo-HCT patients were strongly correlated with the number of memory B cells and of naive CD4+ T cells (r > 0.5, P < 0.01) and more weakly with the number of follicular helper T cells (r = 0.4, P = 0.01). 
CONCLUSIONS: Chronic GVHD and rituximab administration in allo-HCT recipients are associated with reduced Ab responses to BNT162b2 vaccination. Immunological markers could help identify allo-HCT patients at risk of poor Ab response to mRNA vaccination. TRIAL REGISTRATION: The study was registered at clinicaltrialsregister.eu on 11 March 2021 (EudraCT # 2021-000673-83).


Subject(s)
Antibodies, Neutralizing/biosynthesis , COVID-19 Vaccines/therapeutic use , Hematopoietic Stem Cell Transplantation/methods , Adult , Aged , Antibodies, Neutralizing/immunology , BNT162 Vaccine , COVID-19 Vaccines/immunology , Humans , Middle Aged , Transplantation Conditioning , Transplantation Immunology , Transplantation, Homologous
5.
Comput Biol Med ; 131: 104264, 2021 04.
Article in English | MEDLINE | ID: covidwho-1091869

ABSTRACT

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a devastating worldwide effect. Understanding the evolution and transmission of SARS-CoV-2 is of paramount importance for controlling, combating and preventing COVID-19. Due to the rapid growth in both the number of SARS-CoV-2 genome sequences and the number of unique mutations, the phylogenetic analysis of SARS-CoV-2 genome isolates faces an emergent large-data challenge. We introduce a dimension-reduced K-means clustering strategy to tackle this challenge. We examine the performance and effectiveness of three dimension-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Using four benchmark datasets, we found that UMAP is the best-suited technique due to its stable, reliable, and efficient performance, its ability to improve clustering accuracy, especially for large Jaccard distance-based datasets, and its superior clustering visualization. The UMAP-assisted K-means clustering enables us to shed light on increasingly large datasets from SARS-CoV-2 genome isolates.
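
The abstract refers to Jaccard distance-based datasets without showing how they are built. A minimal sketch, assuming each genome isolate is represented as the set of mutations it carries: the Jaccard distance between two isolates is one minus the ratio of shared mutations to all mutations seen in either. The mutation labels below are hypothetical placeholders, and the dimension-reduction and K-means steps that would consume this matrix are not shown.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        # Two empty sets are treated as identical (a convention, not from the paper).
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance_matrix(samples):
    """Pairwise Jaccard distances; row i holds the feature vector for sample i."""
    return [[jaccard_distance(a, b) for b in samples] for a in samples]

# Hypothetical isolates, each described by a set of mutation labels.
isolates = [
    {"m1", "m2"},
    {"m2", "m3"},
    {"m1", "m2"},
    set(),
]

matrix = distance_matrix(isolates)
for row in matrix:
    print([round(x, 3) for x in row])
```

Each row of the matrix has one entry per isolate, so its width grows with the dataset, which is one reason a dimension-reduction step before clustering is attractive for large collections.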


Subject(s)
Algorithms , COVID-19/genetics , Databases, Nucleic Acid , Genome, Viral , Mutation , Phylogeny , SARS-CoV-2/genetics , Humans
6.
Front Genet ; 11: 591833, 2020.
Article in English | MEDLINE | ID: covidwho-1052490

ABSTRACT

SARS-CoV-2 has caused a worldwide pandemic. Existing research on coronavirus mutations is based on small data sets, and multiple sequence alignment using a global-scale data set has yet to be conducted. Statistical analysis of integral mutations and global spread is necessary and could help improve primer design for nucleic acid diagnosis and vaccine development. Here, we optimized multiple sequence alignment using a conserved sequence search algorithm to align 24,768 sequences from the GISAID data set. A phylogenetic tree was constructed using the maximum likelihood (ML) method. Coronavirus subtypes were analyzed via t-SNE clustering. We performed haplotype network analysis and t-SNE clustering to analyze the coronavirus origin and spread. Overall, we identified 33 sense, 17 nonsense, 79 amino acid loss, and 4 amino acid insertion mutations in full-length open reading frames. Phylogenetic trees were successfully constructed and samples clustered into subtypes. The COVID-19 pandemic differed among countries and continents. Samples from the United States and western Europe were more diverse, and those from China and Asia mainly contained specific subtypes. Clades G/GH/GR are more likely to be the origin clades of SARS-CoV-2 compared with clades S/L/V. Conserved sequence searches can be used to segment long sequences, making large-scale multisequence alignment possible and facilitating more comprehensive gene mutation analysis. Mutation analysis of SARS-CoV-2 can inform primer design for nucleic acid diagnosis to improve virus detection efficiency. In addition, research into the characteristics of viral spread and relationships among geographic regions can help formulate health policies and reduce the increase of imported cases.

7.
Biochem Biophys Res Commun ; 533(3): 553-558, 2020 Dec 10.
Article in English | MEDLINE | ID: covidwho-778470

ABSTRACT

Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses, MERS, SARS-CoV-1, and SARS-CoV-2 (the pathogen responsible for the COVID-19 pandemic), cause severe disease. Improved methods to predict host specificity of coronaviruses will be valuable for identifying and controlling future outbreaks. The coronavirus S protein plays a key role in host specificity by attaching the virus to receptors on the cell membrane. We analyzed 1238 spike sequences for their host specificity. Spike sequences readily segregate in t-SNE embeddings into clusters of similar hosts and/or virus species. Machine learning with SVM, Logistic Regression, Decision Tree, and Random Forest gave high average accuracies, F1 scores, sensitivities and specificities of 0.95-0.99. Importantly, sites identified by Decision Tree correspond to protein regions with known biological importance. These results demonstrate that spike sequences alone can be used to predict host specificity.
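
The abstract reports accuracy, F1 score, sensitivity, and specificity. For readers unfamiliar with how these relate, the sketch below computes all four from true and predicted host labels, treating one host as the positive class. The labels are invented for illustration and do not reproduce the study's data or its averaging scheme.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity and F1 for one class versus the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on the positive class
    specificity = ratio(tn, tn + fp)  # recall on the negative class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Hypothetical host labels for six spike sequences.
y_true = ["human", "human", "bat", "bat", "human", "bat"]
y_pred = ["human", "bat", "bat", "bat", "human", "human"]

metrics = binary_metrics(y_true, y_pred, positive="human")
print(metrics)
```

With more than two host classes, one option is to compute these per class in this one-versus-rest fashion and then average; the abstract does not say which averaging the authors used.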


Subject(s)
Computational Biology/methods , Coronavirus/pathogenicity , Host Specificity , Machine Learning , Spike Glycoprotein, Coronavirus , Animals , Humans , Spike Glycoprotein, Coronavirus/chemistry